AITopics | information-theoretic confidence bound

Neural Information Processing Systems http://nips.cc/

algorithm, information gain, thompson, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Information-Theoretic Confidence Bounds for Reinforcement Learning

Neural Information Processing SystemsDec-25-2025, 07:13:41 GMT

We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic quantities and confidence bounds, we obtain results that relate the per-period performance of the agent with its information gain about the environment, thus explicitly characterizing the exploration-exploitation tradeoff. The resulting cumulative regret bound depends on the agent's uncertainty over the environment and quantifies the value of prior information. We show applicability of this approach to several environments, including linear bandits, tabular MDPs, and factored MDPs. These examples demonstrate the potential of a general information-theoretic approach for the design and analysis of reinforcement learning algorithms.

information-theoretic confidence bound, name change, reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback

Information-Theoretic Confidence Bounds for Reinforcement Learning

Xiuyuan Lu, Benjamin Van Roy

Neural Information Processing SystemsOct-2-2025, 14:56:10 GMT

Neural Information Processing Systems http://nips.cc/

machine learning, reinforcement learning, thompson, (19 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Reviews: Information-Theoretic Confidence Bounds for Reinforcement Learning

Neural Information Processing SystemsJan-23-2025, 05:45:48 GMT

It would be great to make it more accessible to a more general audience, as the ideas it contains are fairly intuitive at their core. One suggestion would be to include illustrative figures to convey the general intuition, for example for the case of the linear-Gaussian bandit, since the confidence sets have a natural geometric interpretation in terms of the variance of the posterior. The analyses of the examples given (linear bandits, MDPs, factored MDPs) essentially all follow a recipe made possible by the results relating the confidence bounds to the regret. Specifically, they are: 1) Construct a confidence interval based on the mutual information using the characteristics of the problem at hand (linearity/Gaussian noise assumptions for the bandit, specific forms of the prior for MDPs) 2) Bound the sum of the information gain. Combining these two then gives a regret bound.

bandit, information-theoretic confidence bound, reinforcement learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Add feedback

Reviews: Information-Theoretic Confidence Bounds for Reinforcement Learning

Neural Information Processing SystemsJan-23-2025, 05:45:38 GMT

The paper extends Russo and Van Roy (JMRL2016) work to provide information-theoretical analysis of Thompson sampling and UCB-like algorithms in more general setting. The three reviewers acknowledge the contributions, and the potential impact of connecting information-theoretical concepts to the design of algorithms. Reviewers have suggested ways to improve the manuscript. The authors should follow these directions, and in particular fix notations, include simulation results, and provide explanations about proofs when necessary. The contributions in this paper are "methodological", i.e., it proposes a framework to analyze the regret of certain algorithms.

algorithm, information-theoretic confidence bound, reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Information-Theoretic Confidence Bounds for Reinforcement Learning

Neural Information Processing SystemsOct-9-2024, 21:53:35 GMT

We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic quantities and confidence bounds, we obtain results that relate the per-period performance of the agent with its information gain about the environment, thus explicitly characterizing the exploration-exploitation tradeoff. The resulting cumulative regret bound depends on the agent's uncertainty over the environment and quantifies the value of prior information. We show applicability of this approach to several environments, including linear bandits, tabular MDPs, and factored MDPs. These examples demonstrate the potential of a general information-theoretic approach for the design and analysis of reinforcement learning algorithms.

design and analysis, information-theoretic confidence bound, reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Information-Theoretic Confidence Bounds for Reinforcement Learning

Lu, Xiuyuan, Roy, Benjamin Van

Neural Information Processing SystemsMar-18-2020, 21:18:08 GMT

We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic quantities and confidence bounds, we obtain results that relate the per-period performance of the agent with its information gain about the environment, thus explicitly characterizing the exploration-exploitation tradeoff. The resulting cumulative regret bound depends on the agent's uncertainty over the environment and quantifies the value of prior information. We show applicability of this approach to several environments, including linear bandits, tabular MDPs, and factored MDPs. These examples demonstrate the potential of a general information-theoretic approach for the design and analysis of reinforcement learning algorithms.

design and analysis, information-theoretic confidence bound, reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback